[SPIR-V] Lower llvm.x.with.overflow intrinsics #95012

Open
wants to merge 2 commits into main

Conversation

michalpaszkowski
Member

This patch introduces lowering for the remaining llvm.x.with.overflow
intrinsics. The proposed implementation does not rely on OpIAddCarry
for sadd_with_overflow or the equivalent for ssub_with_overflow.

Also, the patch reorders the SPIRVPrepareFunctions pass so that it runs
after CodeGenPrepare.

The changes push the compilation of vector/scalar_access.cpp and other
DPC++ E2E tests further with -O2 or higher optimization levels.

This change makes sure the preferred switch condition integer type size
remains the same throughout CodeGen optimizations.

The change fixes running several OpenCL applications with -O2 or higher
opt levels, and fixes Basic/stream/stream_max_stmt_exceed.cpp DPC++ E2E
test with -O2.
@llvmbot
Collaborator

llvmbot commented Jun 10, 2024

@llvm/pr-subscribers-backend-spir-v

Author: Michal Paszkowski (michalpaszkowski)

Changes

This patch introduces lowering for the remaining llvm.x.with.overflow
intrinsics. The proposed implementation does not rely on OpIAddCarry
for sadd_with_overflow or the equivalent for ssub_with_overflow.

Also, the patch reorders the SPIRVPrepareFunctions pass so that it runs
after CodeGenPrepare.

The changes push the compilation of vector/scalar_access.cpp and other
DPC++ E2E tests further with -O2 or higher optimization levels.


Full diff: https://github.com/llvm/llvm-project/pull/95012.diff

7 Files Affected:

  • (modified) llvm/lib/Target/SPIRV/SPIRVISelLowering.h (+5)
  • (modified) llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp (+75-27)
  • (modified) llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp (+1-1)
  • (modified) llvm/test/CodeGen/SPIRV/assume.ll (+4-4)
  • (added) llvm/test/CodeGen/SPIRV/llvm-intrinsics/smul.with.overflow.ll (+54)
  • (modified) llvm/test/CodeGen/SPIRV/llvm-intrinsics/umul.with.overflow.ll (+1-1)
  • (added) llvm/test/CodeGen/SPIRV/optimizations/switch-condition-type.ll (+18)
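
To make the diff below easier to follow, here is a rough LLVM IR sketch of what the pass produces for a 32-bit llvm.sadd.with.overflow call: the intrinsic call is redirected to an ordinary function whose body carries the expansion, mirroring the pre-existing umul lowering. This is an illustrative sketch, not code taken from the patch; the function name is assumed from the spirv.llvm_*_with_overflow_* convention used in the tests.

define { i32, i1 } @"spirv.llvm_sadd_with_overflow_i32"(i32 %a, i32 %b) {
entry:
  ; Plain signed add for the result (the pass uses CreateNSWAdd); overflow is
  ; detected from signs: (a > 0 && b > 0 && r < 0) || (a < 0 && b < 0 && r > 0).
  %r = add nsw i32 %a, %b
  %a.pos = icmp sgt i32 %a, 0
  %b.pos = icmp sgt i32 %b, 0
  %r.neg = icmp slt i32 %r, 0
  %pos.and = and i1 %a.pos, %b.pos
  %ovf.pos = and i1 %pos.and, %r.neg
  %a.neg = icmp slt i32 %a, 0
  %b.neg = icmp slt i32 %b, 0
  %r.pos = icmp sgt i32 %r, 0
  %neg.and = and i1 %a.neg, %b.neg
  %ovf.neg = and i1 %neg.and, %r.pos
  %ovf = or i1 %ovf.pos, %ovf.neg
  ; Rebuild the { result, overflow } aggregate returned by the intrinsic.
  %agg = insertvalue { i32, i1 } poison, i32 %r, 0
  %ret = insertvalue { i32, i1 } %agg, i1 %ovf, 1
  ret { i32, i1 } %ret
}
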
diff --git a/llvm/lib/Target/SPIRV/SPIRVISelLowering.h b/llvm/lib/Target/SPIRV/SPIRVISelLowering.h
index 6fc200abf4627..77356b7512a73 100644
--- a/llvm/lib/Target/SPIRV/SPIRVISelLowering.h
+++ b/llvm/lib/Target/SPIRV/SPIRVISelLowering.h
@@ -68,6 +68,11 @@ class SPIRVTargetLowering : public TargetLowering {
   // extra instructions required to preserve validity of SPIR-V code imposed by
   // the standard.
   void finalizeLowering(MachineFunction &MF) const override;
+
+  MVT getPreferredSwitchConditionType(LLVMContext &Context,
+                                      EVT ConditionVT) const override {
+    return ConditionVT.getSimpleVT();
+  }
 };
 } // namespace llvm
 
diff --git a/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp b/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
index 7bee87d7204ed..4c3b6a6a24737 100644
--- a/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
@@ -342,26 +342,70 @@ static void lowerFunnelShifts(IntrinsicInst *FSHIntrinsic) {
   FSHIntrinsic->setCalledFunction(FSHFunc);
 }
 
-static void buildUMulWithOverflowFunc(Function *UMulFunc) {
-  // The function body is already created.
-  if (!UMulFunc->empty())
+static void buildArithWithOverflowFunc(Function *Func, Intrinsic::ID ID) {
+  if (!Func->empty())
     return;
 
-  BasicBlock *EntryBB = BasicBlock::Create(UMulFunc->getParent()->getContext(),
-                                           "entry", UMulFunc);
+  BasicBlock *EntryBB =
+      BasicBlock::Create(Func->getParent()->getContext(), "entry", Func);
   IRBuilder<> IRB(EntryBB);
-  // Build the actual unsigned multiplication logic with the overflow
-  // indication. Do unsigned multiplication Mul = A * B. Then check
-  // if unsigned division Div = Mul / A is not equal to B. If so,
-  // then overflow has happened.
-  Value *Mul = IRB.CreateNUWMul(UMulFunc->getArg(0), UMulFunc->getArg(1));
-  Value *Div = IRB.CreateUDiv(Mul, UMulFunc->getArg(0));
-  Value *Overflow = IRB.CreateICmpNE(UMulFunc->getArg(0), Div);
-
-  // umul.with.overflow intrinsic return a structure, where the first element
-  // is the multiplication result, and the second is an overflow bit.
-  Type *StructTy = UMulFunc->getReturnType();
-  Value *Agg = IRB.CreateInsertValue(PoisonValue::get(StructTy), Mul, {0});
+  Value *LHS = Func->getArg(0);
+  Value *RHS = Func->getArg(1);
+
+  Value *Result;
+  Value *Overflow;
+  Type *StructTy = Func->getReturnType();
+
+  switch (ID) {
+  case Intrinsic::smul_with_overflow:
+    Result = IRB.CreateNSWMul(LHS, RHS);
+    Overflow = IRB.CreateICmpNE(IRB.CreateSDiv(Result, LHS), RHS);
+    break;
+  case Intrinsic::umul_with_overflow:
+    Result = IRB.CreateNUWMul(LHS, RHS);
+    Overflow = IRB.CreateICmpNE(IRB.CreateUDiv(Result, LHS), RHS);
+    break;
+  case Intrinsic::sadd_with_overflow:
+    // TODO: Implement using OpIAddCarry
+    Result = IRB.CreateNSWAdd(LHS, RHS);
+    // Overflow if (LHS > 0 && RHS > 0 && Result < 0) || (LHS < 0 && RHS < 0 &&
+    // Result > 0)
+    Overflow = IRB.CreateOr(
+        IRB.CreateAnd(IRB.CreateAnd(IRB.CreateICmpSGT(LHS, IRB.getInt32(0)),
+                                    IRB.CreateICmpSGT(RHS, IRB.getInt32(0))),
+                      IRB.CreateICmpSLT(Result, IRB.getInt32(0))),
+        IRB.CreateAnd(IRB.CreateAnd(IRB.CreateICmpSLT(LHS, IRB.getInt32(0)),
+                                    IRB.CreateICmpSLT(RHS, IRB.getInt32(0))),
+                      IRB.CreateICmpSGT(Result, IRB.getInt32(0))));
+    break;
+  case Intrinsic::uadd_with_overflow:
+    Result = IRB.CreateNUWAdd(LHS, RHS);
+    // Overflow occurs if the result is less than either of the operands.
+    Overflow = IRB.CreateICmpULT(Result, LHS);
+    break;
+  case Intrinsic::ssub_with_overflow:
+    Result = IRB.CreateNSWSub(LHS, RHS);
+    // Overflow if (LHS < 0 && RHS > 0 && Result > 0) || (LHS > 0 && RHS < 0 &&
+    // Result < 0)
+    Overflow = IRB.CreateOr(
+        IRB.CreateAnd(IRB.CreateAnd(IRB.CreateICmpSLT(LHS, IRB.getInt32(0)),
+                                    IRB.CreateICmpSGT(RHS, IRB.getInt32(0))),
+                      IRB.CreateICmpSGT(Result, IRB.getInt32(0))),
+        IRB.CreateAnd(IRB.CreateAnd(IRB.CreateICmpSGT(LHS, IRB.getInt32(0)),
+                                    IRB.CreateICmpSLT(RHS, IRB.getInt32(0))),
+                      IRB.CreateICmpSLT(Result, IRB.getInt32(0))));
+    break;
+  case Intrinsic::usub_with_overflow:
+    Result = IRB.CreateNUWSub(LHS, RHS);
+    // Overflow occurs if the result is greater than the left-hand-side operand.
+    Overflow = IRB.CreateICmpUGT(Result, LHS);
+    break;
+
+  default:
+    llvm_unreachable("Unsupported arithmetic with overflow intrinsic.");
+  }
+
+  Value *Agg = IRB.CreateInsertValue(PoisonValue::get(StructTy), Result, {0});
   Value *Res = IRB.CreateInsertValue(Agg, Overflow, {1});
   IRB.CreateRet(Res);
 }
@@ -407,18 +451,17 @@ static bool toSpvOverloadedIntrinsic(IntrinsicInst *II, Intrinsic::ID NewID,
   return true;
 }
 
-static void lowerUMulWithOverflow(IntrinsicInst *UMulIntrinsic) {
+static void lowerArithWithOverflow(IntrinsicInst *ArithIntrinsic) {
   // Get a separate function - otherwise, we'd have to rework the CFG of the
   // current one. Then simply replace the intrinsic uses with a call to the new
   // function.
-  Module *M = UMulIntrinsic->getModule();
-  FunctionType *UMulFuncTy = UMulIntrinsic->getFunctionType();
-  Type *FSHLRetTy = UMulFuncTy->getReturnType();
-  const std::string FuncName = lowerLLVMIntrinsicName(UMulIntrinsic);
-  Function *UMulFunc =
-      getOrCreateFunction(M, FSHLRetTy, UMulFuncTy->params(), FuncName);
-  buildUMulWithOverflowFunc(UMulFunc);
-  UMulIntrinsic->setCalledFunction(UMulFunc);
+  Module *M = ArithIntrinsic->getModule();
+  FunctionType *FuncTy = ArithIntrinsic->getFunctionType();
+  Type *RetTy = FuncTy->getReturnType();
+  const std::string FuncName = lowerLLVMIntrinsicName(ArithIntrinsic);
+  Function *Func = getOrCreateFunction(M, RetTy, FuncTy->params(), FuncName);
+  buildArithWithOverflowFunc(Func, ArithIntrinsic->getIntrinsicID());
+  ArithIntrinsic->setCalledFunction(Func);
 }
 
 // Substitutes calls to LLVM intrinsics with either calls to SPIR-V intrinsics
@@ -444,8 +487,13 @@ bool SPIRVPrepareFunctions::substituteIntrinsicCalls(Function *F) {
         lowerFunnelShifts(II);
         Changed = true;
         break;
+      case Intrinsic::sadd_with_overflow:
+      case Intrinsic::smul_with_overflow:
+      case Intrinsic::ssub_with_overflow:
+      case Intrinsic::uadd_with_overflow:
       case Intrinsic::umul_with_overflow:
-        lowerUMulWithOverflow(II);
+      case Intrinsic::usub_with_overflow:
+        lowerArithWithOverflow(II);
         Changed = true;
         break;
       case Intrinsic::assume:
diff --git a/llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp b/llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp
index 52fc6f33b4ef1..845113dd48650 100644
--- a/llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp
@@ -177,11 +177,11 @@ void SPIRVPassConfig::addIRPasses() {
 
   TargetPassConfig::addIRPasses();
   addPass(createSPIRVRegularizerPass());
-  addPass(createSPIRVPrepareFunctionsPass(TM));
   addPass(createSPIRVStripConvergenceIntrinsicsPass());
 }
 
 void SPIRVPassConfig::addISelPrepare() {
+  addPass(createSPIRVPrepareFunctionsPass(TM));
   addPass(createSPIRVEmitIntrinsicsPass(&getTM<SPIRVTargetMachine>()));
   TargetPassConfig::addISelPrepare();
 }
diff --git a/llvm/test/CodeGen/SPIRV/assume.ll b/llvm/test/CodeGen/SPIRV/assume.ll
index fbf12ef184a89..88ff74535bde7 100644
--- a/llvm/test/CodeGen/SPIRV/assume.ll
+++ b/llvm/test/CodeGen/SPIRV/assume.ll
@@ -1,7 +1,7 @@
-; RUN: llc -mtriple=spirv32-unknown-unknown --spirv-ext=+SPV_KHR_expect_assume < %s | FileCheck --check-prefixes=EXT,CHECK %s
-; RUN: llc -mtriple=spirv64-unknown-unknown --spirv-ext=+SPV_KHR_expect_assume < %s | FileCheck --check-prefixes=EXT,CHECK %s
-; RUN: llc -mtriple=spirv32-unknown-unknown < %s | FileCheck --check-prefixes=NOEXT,CHECK %s
-; RUN: llc -mtriple=spirv64-unknown-unknown < %s | FileCheck --check-prefixes=NOEXT,CHECK %s
+; RUN: llc -O0 -mtriple=spirv32-unknown-unknown --spirv-ext=+SPV_KHR_expect_assume < %s | FileCheck --check-prefixes=EXT,CHECK %s
+; RUN: llc -O0 -mtriple=spirv64-unknown-unknown --spirv-ext=+SPV_KHR_expect_assume < %s | FileCheck --check-prefixes=EXT,CHECK %s
+; RUN: llc -O0 -mtriple=spirv32-unknown-unknown < %s | FileCheck --check-prefixes=NOEXT,CHECK %s
+; RUN: llc -O0 -mtriple=spirv64-unknown-unknown < %s | FileCheck --check-prefixes=NOEXT,CHECK %s
 
 ; EXT:        OpCapability ExpectAssumeKHR
 ; EXT-NEXT:   OpExtension "SPV_KHR_expect_assume"
diff --git a/llvm/test/CodeGen/SPIRV/llvm-intrinsics/smul.with.overflow.ll b/llvm/test/CodeGen/SPIRV/llvm-intrinsics/smul.with.overflow.ll
new file mode 100644
index 0000000000000..b336e22a145b4
--- /dev/null
+++ b/llvm/test/CodeGen/SPIRV/llvm-intrinsics/smul.with.overflow.ll
@@ -0,0 +1,54 @@
+; RUN: llc -O0 -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s --check-prefix=CHECK-SPIRV
+
+; CHECK-SPIRV: OpName %[[#NAME_SMUL_FUNC_8:]] "spirv.llvm_smul_with_overflow_i8"
+; CHECK-SPIRV: OpName %[[#NAME_SMUL_FUNC_32:]] "spirv.llvm_smul_with_overflow_i32"
+; CHECK-SPIRV: OpName %[[#NAME_SMUL_FUNC_VEC_I64:]] "spirv.llvm_smul_with_overflow_v2i64"
+
+define dso_local spir_func void @_Z4foo8hhPh(i8 zeroext %a, i8 zeroext %b, i8* nocapture %c) local_unnamed_addr {
+entry:
+  ; CHECK-SPIRV: %[[#]] = OpFunctionCall %[[#]] %[[#NAME_SMUL_FUNC_8]]
+  %smul = tail call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %a, i8 %b)
+  %cmp = extractvalue { i8, i1 } %smul, 1
+  %smul.value = extractvalue { i8, i1 } %smul, 0
+  %storemerge = select i1 %cmp, i8 0, i8 %smul.value
+  store i8 %storemerge, i8* %c, align 1
+  ret void
+}
+
+define dso_local spir_func void @_Z5foo32jjPj(i32 %a, i32 %b, i32* nocapture %c) local_unnamed_addr {
+entry:
+  ; CHECK-SPIRV: %[[#]] = OpFunctionCall %[[#]] %[[#NAME_SMUL_FUNC_32]]
+  %smul = tail call { i32, i1 } @llvm.smul.with.overflow.i32(i32 %b, i32 %a)
+  %smul.val = extractvalue { i32, i1 } %smul, 0
+  %smul.ov = extractvalue { i32, i1 } %smul, 1
+  %spec.select = select i1 %smul.ov, i32 0, i32 %smul.val
+  store i32 %spec.select, i32* %c, align 4
+  ret void
+}
+
+define dso_local spir_func void @smulo_v2i64(<2 x i64> %a, <2 x i64> %b, <2 x i64>* %p) nounwind {
+  ; CHECK-SPIRV: %[[#]] = OpFunctionCall %[[#]] %[[#NAME_SMUL_FUNC_VEC_I64]]
+  %smul = call {<2 x i64>, <2 x i1>} @llvm.smul.with.overflow.v2i64(<2 x i64> %a, <2 x i64> %b)
+  %smul.val = extractvalue {<2 x i64>, <2 x i1>} %smul, 0
+  %smul.ov = extractvalue {<2 x i64>, <2 x i1>} %smul, 1
+  %zero = alloca <2 x i64>, align 16
+  %spec.select = select <2 x i1> %smul.ov, <2 x i64> <i64 0, i64 0>, <2 x i64> %smul.val
+  store <2 x i64> %spec.select, <2 x i64>* %p
+  ret void
+}
+
+; CHECK-SPIRV: %[[#NAME_SMUL_FUNC_8]] = OpFunction %[[#]]
+; CHECK-SPIRV: %[[#VAR_A:]] = OpFunctionParameter %[[#]]
+; CHECK-SPIRV: %[[#VAR_B:]] = OpFunctionParameter %[[#]]
+; CHECK-SPIRV: %[[#MUL_RES:]] = OpIMul %[[#]] %[[#VAR_A]] %[[#VAR_B]]
+; CHECK-SPIRV: %[[#DIV_RES:]] = OpSDiv %[[#]] %[[#MUL_RES]] %[[#VAR_A]]
+; CHECK-SPIRV: %[[#CMP_RES:]] = OpINotEqual %[[#]] %[[#DIV_RES]] %[[#VAR_B]]
+; CHECK-SPIRV: %[[#INSERT_RES:]] = OpCompositeInsert %[[#]] %[[#MUL_RES]]
+; CHECK-SPIRV: %[[#INSERT_RES_1:]] = OpCompositeInsert %[[#]] %[[#CMP_RES]] %[[#INSERT_RES]]
+; CHECK-SPIRV: OpReturnValue %[[#INSERT_RES_1]]
+
+declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8)
+
+declare { i32, i1 } @llvm.smul.with.overflow.i32(i32, i32)
+
+declare {<2 x i64>, <2 x i1>} @llvm.smul.with.overflow.v2i64(<2 x i64>, <2 x i64>)
diff --git a/llvm/test/CodeGen/SPIRV/llvm-intrinsics/umul.with.overflow.ll b/llvm/test/CodeGen/SPIRV/llvm-intrinsics/umul.with.overflow.ll
index 406a23fa7d3df..7a4137b875fd2 100644
--- a/llvm/test/CodeGen/SPIRV/llvm-intrinsics/umul.with.overflow.ll
+++ b/llvm/test/CodeGen/SPIRV/llvm-intrinsics/umul.with.overflow.ll
@@ -42,7 +42,7 @@ define dso_local spir_func void @umulo_v2i64(<2 x i64> %a, <2 x i64> %b, <2 x i6
 ; CHECK-SPIRV: %[[#VAR_B:]] = OpFunctionParameter %[[#]]
 ; CHECK-SPIRV: %[[#MUL_RES:]] = OpIMul %[[#]] %[[#VAR_A]] %[[#VAR_B]]
 ; CHECK-SPIRV: %[[#DIV_RES:]] = OpUDiv %[[#]] %[[#MUL_RES]] %[[#VAR_A]]
-; CHECK-SPIRV: %[[#CMP_RES:]] = OpINotEqual %[[#]] %[[#VAR_A]] %[[#DIV_RES]]
+; CHECK-SPIRV: %[[#CMP_RES:]] = OpINotEqual %[[#]] %[[#DIV_RES]] %[[#VAR_B]]
 ; CHECK-SPIRV: %[[#INSERT_RES:]] = OpCompositeInsert %[[#]] %[[#MUL_RES]]
 ; CHECK-SPIRV: %[[#INSERT_RES_1:]] = OpCompositeInsert %[[#]] %[[#CMP_RES]] %[[#INSERT_RES]]
 ; CHECK-SPIRV: OpReturnValue %[[#INSERT_RES_1]]
diff --git a/llvm/test/CodeGen/SPIRV/optimizations/switch-condition-type.ll b/llvm/test/CodeGen/SPIRV/optimizations/switch-condition-type.ll
new file mode 100644
index 0000000000000..054520d2021b9
--- /dev/null
+++ b/llvm/test/CodeGen/SPIRV/optimizations/switch-condition-type.ll
@@ -0,0 +1,18 @@
+; RUN: llc -O2 -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s
+; RUN: %if spirv-tools %{ llc -O2 -mtriple=spirv32-unknown-unknown %s -o - -filetype=obj | spirv-val %}
+
+; CHECK: %[[#INT16:]] = OpTypeInt 16 0
+; CHECK: %[[#PARAM:]] = OpFunctionParameter %[[#INT16]]
+; CHECK: OpSwitch %[[#PARAM]] %[[#]] 1 %[[#]] 2 %[[#]]
+
+define i32 @test_switch(i16 %cond) {
+entry:
+  switch i16 %cond, label %default [ i16 1, label %case_one
+                                     i16 2, label %case_two ]
+case_one:
+  ret i32 1
+case_two:
+  ret i32 2
+default:
+  ret i32 3
+}

Collaborator

@efriedma-quic left a comment


Could you clarify why you need to do this? The normal flow is that GlobalISel converts the intrinsic call to a generic instruction (G_SMULO etc.), and then LegalizeHelper converts that instruction into a legal instruction.

Contributor

@arsenm left a comment


This should just go through the regular legalization process. You don't need to invent your own IR-level expansion.

Comment on lines +72 to +75
MVT getPreferredSwitchConditionType(LLVMContext &Context,
                                    EVT ConditionVT) const override {
  return ConditionVT.getSimpleVT();
}
Contributor


Unrelated change

Member Author


This PR is based on another one (PR 94959) which hasn't been merged yet. Once the other PR is merged, this one will be rebased and this change will disappear. This was done so that our automated testing covers these changes together.

@michalpaszkowski
Member Author

@efriedma-quic @arsenm I do agree that this should not be needed and that ideally the intrinsics should be covered in the standard GlobalISel flow, but we need those expanded prior to the SPIRVEmitIntrinsics pass for special handling we have there for aggregates. Unfortunately, the problem is not as simple to solve; this patch simply extends the existing expansion we had for umul to the remaining operations.
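
For context, the aggregate-returning form under discussion is the standard call-plus-extractvalue pattern; this fragment is lifted from the new smul test in the diff above:

  %smul = tail call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %a, i8 %b)
  %cmp = extractvalue { i8, i1 } %smul, 1
  %smul.value = extractvalue { i8, i1 } %smul, 0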

@michalpaszkowski
Member Author

The major (sole) role of the SPIRVPrepareFunctions pass is to prepare and expose all aggregate-related instructions and intrinsics to the SPIRVEmitIntrinsics pass, where:

// This pass performs the following transformation on LLVM IR level required
// for the following translation to SPIR-V:
// - replaces direct usages of aggregate constants with target-specific
//   intrinsics;
// - replaces aggregates-related instructions (extract/insert, ld/st, etc)
//   with a target-specific intrinsics;

@arsenm
Contributor

arsenm commented Jun 11, 2024

for special handling we have there for aggregates.

These shouldn't need treatment as aggregates, they will IRTranslate to a return of tuple?

@michalpaszkowski
Member Author

michalpaszkowski commented Jun 11, 2024

for special handling we have there for aggregates.

These shouldn't need treatment as aggregates, they will IRTranslate to a return of tuple?

We generate InsertValue instructions which we directly replace in the SPIRVEmitIntrinsics. We want all of these to "survive" IRTranslation.

@efriedma-quic
Collaborator

Unfortunately, the problem is not as simple to solve; this patch simply extends the existing expansion we had for umul to the remaining operations.

I'd prefer to solve this sooner rather than later. I'm not so much worried about the size of this particular patch, but this seems like it's going to be a continual maintenance burden for target-independent changes.

We generate InsertValue instructions which we directly replace in the SPIRVEmitIntrinsics. We want all of these to "survive" IRTranslation.

Probably the SPIRV backend should be doing calling convention lowering inside GlobalISel, instead of trying to do it as a pre-pass. Calling convention lowering is all target-specific anyway, so you can generate whatever you need.

@michalpaszkowski
Member Author

Unfortunately, the problem is not as simple to solve; this patch simply extends the existing expansion we had for umul to the remaining operations.

I'd prefer to solve this sooner rather than later. I'm not so much worried about the size of this particular patch, but this seems like it's going to be a continual maintenance burden for target-independent changes.

We generate InsertValue instructions which we directly replace in the SPIRVEmitIntrinsics. We want all of these to "survive" IRTranslation.

Probably the SPIRV backend should be doing calling convention lowering inside GlobalISel, instead of trying to do it as a pre-pass. Calling convention lowering is all target-specific anyway, so you can generate whatever you need.

I do agree at a high level; there might be some additional details which I do not remember right now, but I will make sure to experiment with this a bit more and discuss it in the upcoming SPIR-V backend sync-up on Monday. The direction and goal, though, is to address this issue. Also, simplification here would only benefit us going forward.

For this particular patch, if possible, let's continue with the expansion prior to IRTranslation. I will hold off on merging this patch until the end of the day in case there are any additional comments.

Still, the implementation will need to be changed in the next patch to use native SPIR-V OpIAddCarry and OpISubBorrow.

@efriedma-quic
Collaborator

If it would be significantly more convenient for you to land this and then look at the underlying issue, I guess I won't block that. But I'd consider fixing it properly a blocker for exiting "experimental" status.

@michalpaszkowski
Member Author

michalpaszkowski commented Jun 12, 2024

This particular case (the llvm.x.with.overflow intrinsics) can probably be reworked in its entirety using native SPIR-V instructions (coming in another patch, as this requires further testing with our non-standard-compliant SPIR-V consumer). Given that we have a 1:1 mapping, we won't need to do anything related to aggregates prior to IRTranslation for these builtins.

However, the majority of the logic in SPIRVPrepareFunctions and SPIRVEmitIntrinsics related to handling aggregates is not so simple to rework -- if this is what you mean -- and I am not entirely sure it is even possible to handle some cases after IRTranslation without too much inefficiency and unnecessary complexity.

@efriedma-quic Could you please elaborate on what exactly you consider to be a blocker for exiting "experimental" status? Also, would you be available to join our SPIR-V backend sync-up meeting this upcoming Monday to discuss this? Alternatively, we could discuss this over email if that works better. We would gladly take any tips on how we could rework our pre-IRTranslation passes. SPIR-V is really on a similar level to LLVM IR, and things that apply to other targets do not fit as easily here.

CC: @VyacheslavLevytskyy @iliya-diyachkov

@efriedma-quic
Collaborator

Could you please elaborate on what exactly you consider to be a blocker for exiting "experimental" status?

Specifically, the fact that it's impossible for an intrinsic with multiple results to reach GlobalISel.

The reason why we want SPIR-V lowering to go through GlobalISel in the first place is so that we can take advantage of shared code: legalization and lowering optimizations don't need to be specifically coded for SPIR-V. If a large fraction of intrinsics can't ever be lowered to GlobalISel operations, we lose that: those intrinsics will all need to be special-cased in SPIR-V-specific code.

I expect the simplest way to make that work would be to just reconstruct aggregates inside GlobalISel... but if you can come up with some other approach to exclude intrinsics from aggregate lowering, that's okay, I guess.

@efriedma-quic
Collaborator

I'll plan to attend the meeting next Monday to discuss.
